Arabic Parsing Using Grammar Transforms

Authors

  • Lamia Tounsi
  • Josef van Genabith
Abstract

We investigate Arabic Context Free Grammar parsing with dependency annotation, comparing lexicalised and unlexicalised parsers. We study how the percolation of morphosyntactic and function tag information, in the form of grammar transforms (Johnson, 1998; Kulick et al., 2006), affects the performance of a parser and helps dependency assignment. We focus on the three most frequent functional tags in the Arabic Penn Treebank: subjects, direct objects and predicates. We merge these functional tags with their phrasal categories and (where appropriate) percolate case information to the non-terminal (POS) category to train the parsers. We then automatically enrich the output of these parsers with full dependency information in order to annotate trees with Lexical Functional Grammar (LFG) f-structure equations, which produce f-structures, i.e. attribute-value matrices approximating basic predicate-argument-adjunct structure representations. We present a series of experiments evaluating how well lexicalised, history-based, generative (Bikel) as well as latent variable PCFG (Berkeley) parsers cope with the enriched Arabic data. We measure quality and coverage of both the output trees and the generated LFG f-structures. We show that joint functional and morphological information percolation improves both the recovery of trees and the dependency results in the form of LFG f-structures.
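The kind of transform the abstract describes can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tuple-based tree encoding, the `_` separator, the `transform` and `merge_function_tags` names, and the transliterated example words are all assumptions made for illustration. It keeps only the SBJ, OBJ and PRD function tags, fuses them onto the phrasal category, and appends a case value to the POS tag where one is available.

```python
# Function tags retained by the transform (the three named in the abstract).
KEPT_TAGS = {"SBJ", "OBJ", "PRD"}


def merge_function_tags(label):
    """Fuse SBJ/OBJ/PRD onto the phrasal category; drop other tags and indices."""
    if label.startswith("-"):  # e.g. -NONE- (empty element): leave untouched
        return label
    category, *tags = label.split("-")
    kept = [t for t in tags if t in KEPT_TAGS]
    return "_".join([category] + kept)


def transform(tree):
    """Apply the transform to a tree.

    Assumed encoding (hypothetical, not a treebank format):
      phrase: (label, [children])
      leaf:   (pos, word, case) where case is a string such as "NOM" or None
    """
    if isinstance(tree[1], str):  # leaf node
        pos, word, case = tree
        new_pos = f"{pos}_{case}" if case else pos  # percolate case onto POS
        return (new_pos, word)
    label, children = tree
    return (merge_function_tags(label), [transform(c) for c in children])


# Hypothetical example tree with transliterated words.
example = ("S", [("VP", [
    ("VBD", "kataba", None),
    ("NP-SBJ", [("NN", "al-waladu", "NOM")]),
    ("NP-OBJ-1", [("NN", "risalatan", "ACC")]),
    ("NP-TMP", [("NN", "al-yawma", "ACC")]),
])])

transformed = transform(example)
```

In this sketch `NP-SBJ` becomes `NP_SBJ`, `NP-OBJ-1` becomes `NP_OBJ`, and `NP-TMP` is reduced to `NP`, so a parser trained on the transformed trees sees the three functional distinctions as part of the category label.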

Similar resources

ArabTAG: a Tree Adjoining Grammar for Arabic Syntactic Structures

In order to construct a generic grammatical resource for the Arabic language, we have chosen to develop an Arabic grammar based on the TAG formalism. Our choice is justified, in particular, by the complementarities that we have noticed between Arabic syntax and this grammatical formalism. This paper consists of two comparative studies. The first is between a set of unification grammars. The second is between ...

A top-down chart parser for analyzing arabic sentences

Parsing of Arabic sentences is a necessary mechanism for many natural language processing applications such as machine translation, question answering, knowledge extraction and information retrieval. In this study, we present a top-down chart parser for parsing simple Arabic sentences, including nominal and verbal sentences, within a domain-specific Arabic grammar. We used the Context Free Grammar...

Unlexicalised Hidden Variable Models of Split Dependency Grammars

This paper investigates transforms of split dependency grammars into unlexicalised context-free grammars annotated with hidden symbols. Our best unlexicalised grammar achieves an accuracy of 88% on the Penn Treebank data set, which represents a 50% reduction in error over previously published results on unlexicalised dependency parsing.

Statistical Parsing by Machine Learning from a Classical Arabic Treebank

Research into statistical parsing for English has enjoyed over a decade of successful results. However, adapting these models to other languages has met with difficulties. Previous comparative work has shown that Modern Arabic is one of the most difficult languages to parse due to rich morphology and free word order. Classical Arabic is the ancient form of Arabic, and is understudied in computa...

Finite-state Approximation of Constraint-based Grammars using Left-corner Grammar Transforms

This paper describes how to construct a finite-state machine (FSM) approximating a 'unification-based' grammar using a left-corner grammar transform. The approximation is presented as a series of grammar transforms, and is exact for left-linear and right-linear CFGs, and for trees up to a user-specified depth of center-embedding. 1 Introduction. This paper describes a method for approx...

Publication date: 2010